Chapter 12
MIME and Helper Applications
This chapter deals with some fundamentals for your Intranet. In
subsequent chapters, you'll use these building blocks to help
your customers access specific information in their everyday work.
This chapter assumes you already have a Web server up and running,
at least in a rudimentary fashion. If this is not the case, you'll
probably want to get your server up, so you can see and manipulate
the sample Web server and browser configuration files, which are
discussed here. Refer back to Part II of this book, "Getting
Set Up on the Server," and review Chapter 7, "Running
the Intranet Web Server."
The configuration information that comes with the IIS 2.0 Web
server software is fairly basic. It doesn't give you much in the
way of troubleshooting information, nor does it go very far to
explain the reasons behind certain aspects of the program. This
chapter fills in this information gap. For example, a handy little
section in the help file explains how to configure MIME types
in the Registry, but this chapter explains why you would want
to do that. This chapter also discusses the server MIME mappings
in the Registry, their meaning in Web technology, and how you
can use them to set up your Intranet, concepts that are central
to this book.
Web browsers like Netscape, Explorer,
and Mosaic are amazing packages. They not only enable you to search
the World Wide Web for interesting and useful documents, images,
and other data, but they also provide a friendly interface for
older Internet services such as FTP and Gopher. These browsers
can display not only plain text and HTML text (and HTML hyperlinks)
but also several common types of image files, even without any
helper applications. But even these amazing programs have their
limits. People use a mind-boggling array of different formats
to store their data on computers, and new formats are being invented
all the time. Your Web browser can't possibly handle all the existing
kinds of data, let alone the new formats just being invented.
That's where helper applications come in.
The pioneering developers of Web technology, scientists at CERN,
the European Particle Physics Lab, wanted to develop some means
of integrating various kinds of data into a single, user-friendly
interface. Toward this end, the folks at CERN made a critical
choice early in their work to allow a Web browser to call other
computer programs to handle data that it can't handle itself.
You probably know these other programs as helper applications,
though some people call them external viewers. Whatever
they're called (the term helper application is used in this book),
the decision to enable Web browsers to hand off data to a different,
outside program was sheer genius.
You might have already set up your browser to use helper applications
for viewing Web video or listening to Web sounds. What you may
not realize is that the mechanism for handing off data to helper
applications is a standardized one, and you can use it for almost
anything you can imagine. Helper applications aren't just for
viewing video clips, as the following examples demonstrate:
- Your everyday word processor can function as a helper application,
enabling you to distribute boilerplate documents with your Web
server.
- You can set up your spreadsheet program as a helper application
to enable your customers to download live data and then manipulate
it.
- You can use a presentation graphics package as a helper application
to open up computer-based training possibilities for your customers.
I'll come back to the specifics of setting up helper applications
later in this chapter. First, though, you need to understand the
mechanism by which Web browsers pass off data they can't handle
internally to external programs. This subject might seem a digression
from your Intranet, but understanding it is critical
to your success.
The original Web developers at CERN decided
on a single interface for a variety of data types. The developers
implemented this decision by adopting an existing mechanism called
Multipurpose Internet Mail Extensions or MIME.
As the name implies, MIME hails from the world of Internet electronic
mail. E-mail is one of the oldest Internet services, pre-dating
the World Wide Web by many years. E-mail is still one of the most
popular Internet services and is often given as the reason organizations
and people want Internet access. Despite its popularity, though,
Internet e-mail has been limited by the requirement that only
plain ASCII text can be used in messages. This requirement means
that nontext files, such as applications, data files that include
formatting (like word processor files), and other binary files,
can't be e-mailed as-is. It also means that even simple non-ASCII
characters, such as non-English characters used in many languages
around the world, won't pass e-mail muster.
As is often the case with computers and the Internet, there are
ways you can work around this limitation to get a binary data
file from one place to another intact. For example, you can use
the File Transfer Protocol (FTP) to transfer any kind of file
from one computer to another over the Internet. Also, if you've
used Internet e-mail to send data files very much, particularly
to or from UNIX systems, you may know about the uuencode
and uudecode programs. The uuencode program
converts a binary file into a specially encoded ASCII text file
so it can be sent by e-mail. Its companion utility, uudecode,
converts the encoded file back into its original format on the
recipient's end.
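The uuencode idea can be sketched with Python's standard binascii module, which implements the same line-by-line encoding. This is a minimal illustration with an invented payload; it omits the "begin" and "end" lines that the uuencode program wraps around the encoded data.

```python
import binascii

LINE_BYTES = 45  # uuencode packs at most 45 input bytes into each text line


def uuencode_lines(data):
    """Encode binary data as a list of uuencoded ASCII lines."""
    return [binascii.b2a_uu(data[i:i + LINE_BYTES])
            for i in range(0, len(data), LINE_BYTES)]


def uudecode_lines(lines):
    """Decode lines produced by uuencode_lines back into the original bytes."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


payload = bytes(range(256))            # every byte value, including non-ASCII ones
encoded = uuencode_lines(payload)

# Every encoded byte is 7-bit ASCII, so the text can travel through e-mail.
assert all(byte < 128 for line in encoded for byte in line)
# Decoding on the recipient's end restores the binary file exactly.
assert uudecode_lines(encoded) == payload
```

The sender and recipient still have to run the encoder and decoder themselves, which is exactly the inconvenience described next.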
Neither of these workarounds is really convenient, though. Both
require not only extra steps but also a certain amount of skill
and knowledge on the part of both the sender and receiver of the
message, skill and knowledge that the casual e-mail user
may not have. Sophisticated, user-friendly e-mail tools have emerged
in the past few years, and most have point-and-click features
for attaching any kind of data file to a message. These tools
are easy to use and work well for exchanging nontext data, provided
both sender and recipient are using the same package.
Unfortunately, users of different proprietary-format e-mail programs,
such as a cc:Mail user and a Microsoft Mail user, can't easily
exchange data files through e-mail. Both cc:Mail and Microsoft
Mail use a proprietary message format. Although there are gateway
packages for both, they are expensive and don't always work well.
Lotus (manufacturer of cc:Mail) and Microsoft would have
you believe that the solution to these incompatibility problems
lies in buying their package for every user, or in buying
an expensive piece of e-mail gateway software and dedicating hardware
on which to run it. In the context of the Internet, though, these
solutions are inadequate. These vendors might wish they could sell their
packages to every one of the millions of Internet e-mail users,
but this feat is unlikely. If you need to send Internet e-mail,
your fancy mail program's file attachment feature will break down
sooner or later.
In 1991, Nathaniel S. Borenstein of Bellcore
proposed major extensions to Internet electronic mail standards.
Called Multipurpose Internet Mail Extensions, or MIME,
Borenstein's proposal extended the existing Simple Mail Transfer
Protocol (SMTP) standards to offer a "standardized way to
represent and encode a wide variety of media types, including
textual data in non-ASCII character sets, for transmission via
Internet mail."
The MIME proposal, which was issued as Internet Requests for Comments
(RFCs) 1521 and 1522, extended the earlier RFCs that define Internet
mail (primarily RFC 822, which specifies the format of messages) to allow
the attachment of virtually any kind of data file to an Internet e-mail
message using a simple mechanism.
NOTE
The Internet has a long history of development through consensus. The TCP/IP networking protocols, developed at first with U.S. Government (Department of Defense) support, were worked out through give-and-take revolving around publicly proposed standards called Requests for Comments. Internet developers issued proposed standards for the nuts-and-bolts of the Internet, calling for comment from the then-small Internet community. Coordinated by the Internet Engineering Task Force (IETF), a consensus-building process for developing standards grew up, with feedback on RFCs eventually incorporated into the final standards the IETF issued. To date, more than 2,000 Internet RFCs have been issued. Many of them have made their way into final standards, guaranteeing that different vendors' TCP/IP networking applications can work together. You can find the complete set of Internet RFCs at http://www.internic.net/ds/dspg0intdoc.html.
RFC 822 defines the standard format of Internet text messages; its companion, RFC 821, defines the Simple Mail Transfer Protocol that carries them. Anyone who wants to develop an Internet e-mail program can follow the requirements of RFC 822 to ensure the package works with all other RFC 822-compliant e-mail packages.
Under the terms of RFC 822, an Internet e-mail message has
two parts:
- A header, often likened to the envelope in which you mail
a letter at the post office, which contains addressing and postmark
information
- A body, like the information inside the envelope, which contains
the text of the message
This division of e-mail messages into a header section and a body
section is critical to MIME and, as you will see later in this
chapter, it is also important in World Wide Web services. Consequently,
this division will be important to you in setting up your Intranet.
The header section is itself made up of separate header fields,
each having the same general format:
- A header name (From, To, Date, and so on) followed by a colon,
conventionally followed by a single space. (Multiword header names,
such as Reply-To, are hyphenated.)
- The header content, such as the addressee's e-mail address,
the time, or the sender's e-mail address.
Each header is logically a single line (a long header can be folded
onto continuation lines that begin with a space or tab). Some headers
are required by RFC 822, and others are optional. The important
point is that they all follow the same format, with a colon
separating the header name and contents.
You will see additional headers on your own e-mail messages, including
what might be termed postmarks of all the Internet hosts that
handled your message on its way to you. Even so, all follow this
simple format, and the header section of all e-mail messages,
regardless of how many headers there are, is separated from the
body by a single blank line.
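To make these rules concrete, here is a minimal Python sketch that splits a message at the first blank line and separates each header name from its content at the first colon. The sample message is invented, and the sketch ignores folded (continued) header lines; Python's standard email.parser module, used at the end as a cross-check, handles the full rules.

```python
from email.parser import Parser

RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Date: Mon, 3 Mar 1997 09:15:00 -0500\n"
    "Subject: Quarterly figures\n"
    "\n"                                  # the blank line that ends the header section
    "The figures are attached.\n"
)


def split_message(raw):
    """Return (headers, body) for a simple RFC 822-style message."""
    head, _, body = raw.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        name, _, content = line.partition(":")   # split at the first colon only
        headers[name] = content.strip()
    return headers, body


headers, body = split_message(RAW)
print(headers["Subject"])   # Quarterly figures
print(body, end="")         # The figures are attached.

# The standard library parser agrees on the same header/body split.
parsed = Parser().parsestr(RAW)
assert parsed["Date"] == headers["Date"]
assert parsed.get_payload() == body
```

Splitting at the first colon matters: the Date header's content contains colons of its own.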
If you're interested in looking at the headers on your own e-mail
messages, many mail programs (including Eudora Light) have a setup
command to let you choose whether you want to see all the headers
on a message. You'll see that each message contains header and
body parts.
As noted, the broad division of Internet e-mail
messages into the header and body sections is always present
in the format just described. Borenstein's MIME proposal, also
grossly simplified here, was to extend this basic division by
doing the following:
- Adding a new header type that specified whether a message
was a multipart message, with some or no normal text and zero
or more attachments
- Enabling the data to be encoded into a special ASCII text
format, and then attached to the message body, with separating/identifying
information
You can read the details of MIME in RFC 1521 and RFC 1522. In
essence, the new header type allows one or more of a set of message
content types to be identified and attached to messages. The content
types include image, audio, video, application data, and, of course,
text. In addition, a special content type allows multiple attachments
of differing data types to the same message.
There are several MIME headers, including MIME-Version,
Content-Type, and Content-Transfer-Encoding. You can read about
these headers in detail in the MIME RFCs. The important part to
note is that these are just additional headers that follow the
standard Internet e-mail message format.
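You can see this for yourself by building a MIME message with Python's standard email package. The sketch below assumes invented addresses and a few placeholder bytes standing in for a real audio file; the point is that MIME-Version, Content-Type, and Content-Transfer-Encoding come out as ordinary headers.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Sound clip"
msg.set_content("A short audio clip is attached.")       # the plain-text part
msg.add_attachment(b".snd placeholder",                  # stand-in audio bytes
                   maintype="audio", subtype="basic", filename="clip.au")

print(msg["MIME-Version"])          # 1.0
print(msg.get_content_type())       # multipart/mixed
for part in msg.iter_parts():
    # Each part carries its own Content-Type and Content-Transfer-Encoding headers.
    print(part.get_content_type(), part["Content-Transfer-Encoding"])

print(msg.as_string())              # the complete message: headers, blank line, body
```

Adding the attachment turns the message into a multipart/mixed message, and the binary part is encoded as base64 ASCII text so it can pass through ordinary mail.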
MIME-capable mail user agents parse incoming messages for the
MIME-extended headers. Based on the content type of the message
and a set of user-configurable rules associating particular content
types with application programs (or viewers), the MIME mail program
passes attachments off to other application programs on the system
that are capable of dealing with them. For example, an incoming
MIME-formatted e-mail message may have an audio file attached.
The recipient's MIME-compliant mail tool recognizes the sound
file attachment from the extended headers in the message and fires
off an audio player to play the sound. Likewise, your Web browser
passes off data it cannot handle directly to helper applications
on your system that can handle the data.
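The dispatch step can be sketched in Python, assuming a hypothetical rule table that maps content types to viewer program names (audioplay and xv are placeholders). The sketch uses the standard EmailMessage.iter_attachments method to find the attached parts and only reports which viewer would be chosen; it does not launch anything.

```python
from email.message import EmailMessage

# Hypothetical user-configurable rules: MIME type/subtype -> viewer program.
VIEWER_RULES = {
    "audio/basic": "audioplay",
    "image/gif": "xv",
}


def plan_viewers(msg):
    """List (filename, content type, viewer) per attachment; viewer is None if no rule matches."""
    plan = []
    for part in msg.iter_attachments():
        ctype = part.get_content_type()
        plan.append((part.get_filename(), ctype, VIEWER_RULES.get(ctype)))
    return plan


msg = EmailMessage()
msg["Subject"] = "Two attachments"
msg.set_content("See attached.")
msg.add_attachment(b"placeholder", maintype="audio", subtype="basic",
                   filename="clip.au")
msg.add_attachment(b"placeholder", maintype="application", subtype="x-pluperstat",
                   filename="q3.plu")

for filename, ctype, viewer in plan_viewers(msg):
    print(filename, ctype, viewer or "no viewer configured")
```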
Internet e-mail handling programs are usually
divided into two categories. First, users that are creating, sending,
and reading e-mail messages use Mail User Agents (MUAs).
Examples of Windows GUI MUAs include Exchange (free with Windows),
Eudora, and Pegasus (another very popular freeware application
available on the Internet).
A separate program, however, usually does the work of routing
and delivering e-mail. Such programs are known as
Mail Transport Agents (MTAs). PC MUAs like Eudora, though,
generally have enough MTA features built in that
the mail you create gets handed off immediately to a mail server
for delivery.
Similarly, MIME-compliant MUAs create MIME-formatted messages
automatically. Attaching a file is simple for the user; it's usually
a point-and-click operation in graphical MUAs, with the encoding
handled internally by the program.
Probably the most widely used MIME-compliant MUA is the PC and
Macintosh package called Eudora. It's a basic Internet e-mail
package with most of the standard MUA features, but it's also
MIME-compliant. A postcardware (meaning freeware if you
send its author, Jeff Beckley, a postcard) version of Eudora Light
is available on the CD-ROM that accompanies this book.
You know from using your WWW browser
that you can deal with many kinds of data and Internet services.
Your Web browser can display images, access Gopher and FTP services,
and, when properly equipped with helper applications, play movies
or audio that you find on the Web. Because you've set up a Web
server of your own, you also know you can make these and other
data types available on your server, and you know how to write
the HTML to include them in your Web pages. You may not know,
however, that the MIME mechanism just described is what makes
this all possible.
To help you understand this process, this section delves more
deeply into the details of MIME as it relates to Web servers,
browsers, and helper applications. You'll learn how Web servers
use MIME to distinguish among the types of data they're serving
and how Web servers use MIME to tell Web browser clients what
sort of data is being sent in every single transaction.
Web servers understand MIME information
and provide it to Web browsers in every HTTP transaction. As described
earlier in this chapter, MIME is able to identify a number of
data types (called content types in the MIME discussion earlier)
and subtypes. Web server software uses an extensive database of
MIME content type information. With IIS, this database is in the
Windows NT Registry underneath this key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\InetInfo\Parameters\MimeMap
Figure 12.1 displays the Registry Editor opened to this key with
the MIME type for Microsoft Word documents selected in the right-hand
window.
Figure 12.1 : Editing the IIS MIME map using the Windows
NT Registry Editor.
The Layout of the Server MIME Map
IIS installs over 100 MIME mappings by default.
The syntax of each row in the MimeMap key is as follows:
<mime type>,<filename extension>,,<gopher type>
For example, the server uses the following line to tell the browser
that a .doc file is a Microsoft Word document:
application/msword,doc,,5
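The row layout is simple enough to parse in a few lines. This Python sketch splits a row into its fields and then splits the MIME type at the slash; the field names are the sketch's own, and the empty third field is simply skipped.

```python
from typing import NamedTuple


class MimeMapRow(NamedTuple):
    mime_type: str      # for example, application/msword
    extension: str      # filename extension without the dot
    gopher_type: str    # Gopher item type code


def parse_row(row):
    """Parse '<mime type>,<filename extension>,,<gopher type>' into its fields."""
    mime_type, extension, _empty, gopher_type = row.split(",", 3)
    return MimeMapRow(mime_type, extension, gopher_type)


row = parse_row("application/msword,doc,,5")
major, minor = row.mime_type.split("/", 1)
print(row.extension, major, minor, row.gopher_type)   # doc application msword 5
```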
Notice that the <mime type> field is subdivided
into two parts by a forward slash. Remember from the discussion
of MIME earlier in the chapter that the proposed MIME standards
include a set of data types (content types) that can be attached
to e-mail messages. The <mime type> field represents
these very same data types. If you scroll down the window pane
on the right side of the Registry Editor window and look at just
the part of the <mime type> field before the slash,
you can see six data types:
- Application
- Audio
- Image
- Text
- Video
- X-world
Two other common MIME types, which IIS does not install automatically,
are the following:
- Message
- Multipart
Apart from x-world (the x- prefix marks a nonstandard, experimental
type), these MIME types follow the conventions proposed in Nathaniel
Borenstein's MIME RFCs and are the same types supported by the
MIME-compliant e-mail packages listed earlier. Thus, this short
list of MIME data types is incorporated into your Web server.
Of course, different kinds of data can fall into these broad categories,
so the MIME data types are subdivided into MIME data subtypes.
The part to the right of the slash in the MIME map signifies
subtypes of the major MIME data types. You're no doubt familiar
with several image formats, such as .gif, .jpeg, and
.bmp. Thus, you'll see a number of entries
for the image data type, one for each of the major image subtypes,
such as image/jpeg. Similarly, you'll see a couple of
different video and audio subtypes, including video/mpeg.
Perhaps the largest number of subtypes are those of the application
data type. As you can see from Figure 12.1, a large number of
well-known application programs are listed. These range from everyday
office word processors (application/msword for Microsoft Word) to standard
UNIX utilities (like tar) to special-purpose formats
(like PostScript). MIME gives the data of each of these programs a
standard name and provides the mechanism for handing that data to the
right program. If you use these applications,
or any of the other applications listed in the MIME map, your
Web server knows about them, and you'll be able to put them to
work as a part of your Intranet by using the information in this
book.
Look at the remaining data on each row of the MIME map. The MIME
mechanism associates filename extensions with data types/subtypes.
The second field of each row contains a filename extension to be
associated with the MIME data type/subtype in the first field of
the row. For example, the entry for image/gif uses the
filename extension gif, and the entries for application/postscript
use several filename extensions: ai, ps, and
eps.
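A server-side lookup can be sketched by turning the rows into a dictionary keyed by extension. The sample rows are illustrative: their Gopher codes are placeholders rather than values copied from an IIS Registry, and the fallback type for unknown extensions, application/octet-stream, is this sketch's assumption.

```python
import posixpath

SAMPLE_ROWS = [
    "image/gif,gif,,g",
    "application/postscript,ai,,5",
    "application/postscript,ps,,5",
    "application/postscript,eps,,5",
    "application/msword,doc,,5",
    "video/mpeg,mpeg,,5",
]


def build_extension_map(rows):
    """Map each filename extension (lowercased) to its MIME type/subtype."""
    table = {}
    for row in rows:
        mime_type, extension, _empty, _gopher = row.split(",", 3)
        table[extension.lower()] = mime_type
    return table


def type_for(filename, table, default="application/octet-stream"):
    """Look up a filename's MIME type by its extension."""
    extension = posixpath.splitext(filename)[1].lstrip(".").lower()
    return table.get(extension, default)


TABLE = build_extension_map(SAMPLE_ROWS)
print(type_for("volcano.mpeg", TABLE))   # video/mpeg
print(type_for("FIGURE.EPS", TABLE))     # application/postscript
```

Note that one MIME type can own several extensions, as application/postscript does here, but each extension maps to exactly one type.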
To put this another way, the MIME map ties filename extensions to
data types, and Web browsers in turn tie those data types to specific
computer programs. Your Web server knows,
from the MIME map, that a .doc file is a data file for
Microsoft Word, a .ps file is a PostScript document,
and an .mpeg file is an MPEG (Moving Picture Experts
Group) video movie. This is an important piece of information
for your Intranet because now your Web server can tell your clients
(that is, Explorer, Netscape, Mosaic, or another Web browser)
what sort of data is coming when your customers click a hyperlink.
Just as Web servers know about MIME
types and include the information in every piece of data they
send to Web browsers, the Web browsers understand MIME as well.
Web Servers Say What They're Sending
Web servers always precede anything they send
in response to a client request (for example, when you click a
hyperlink) with some preliminary header information. From the
discussion about MIME headers in the e-mail context, you can probably
guess that these headers contain MIME data type/subtype information.
Specifically, when a Web server responds to a request from a Web
browser for a document or other piece of data, the server announces
to the browser in one or more headers the type of data it is sending,
using the associations in the MIME map in the Registry. Thus,
when you click a hyperlink pointing to a video file (volcano.mpeg,
for example), the headers sent back to your browser ahead of the
file itself announce its MIME type/subtype, video/mpeg.
Your browser, then, knows what sort of data is coming even before
it arrives.
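On the wire, the announcement is an ordinary header in the HTTP response. The following Python sketch builds the status line and headers a server might send ahead of volcano.mpeg; the tiny extension table, the status line, and the byte count are assumptions for illustration, and real servers add other headers as well.

```python
import posixpath

# Hypothetical excerpt of a server's extension -> MIME type map.
EXTENSION_MAP = {"html": "text/html", "gif": "image/gif", "mpeg": "video/mpeg"}


def response_head(path, body_length):
    """Build the HTTP/1.0 status line and headers that precede a file's data."""
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    content_type = EXTENSION_MAP.get(extension, "application/octet-stream")
    lines = [
        "HTTP/1.0 200 OK",
        "Content-Type: " + content_type,       # the MIME type/subtype announcement
        "Content-Length: " + str(body_length),
    ]
    # As in e-mail, a blank line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n"


print(response_head("/movies/volcano.mpeg", 1048576), end="")
```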
Web Browsers Understand MIME Types, Too
Your Web browser understands MIME and
its data types/subtypes. Your browser reads the incoming MIME
type header information from the Web server and decides what to
do with the incoming data based on its type. For example, your
Web browser knows what to do with data of the MIME type text/html
(regular Web pages in HTML) or image/gif (a .gif image).
It has a built-in ability to properly handle these and other common
types of data. That's how you're able to read most documents you
find on the Web and see most images as well.
As noted at the beginning of this chapter,
Web browsers can't possibly handle all kinds of data. You already
know about common helper applications. What you might not know
is that the MIME information is intimately involved with these
helper applications. Web browsers use the MIME type header information
they get from Web servers, together with the same data type/subtype
and filename extension associations, to pass off the data to helper applications.
This process enables you to play Web movies or sound files. And
this process is how, as you'll learn in later chapters, you can
use MIME information to create your own associations between data
and your own helper applications for your Intranet.
A MIME Conversation
The following imaginary dialog between a Web browser and Web server, written in plain English instead of in the Hypertext Transfer Protocol (HTTP) using MIME headers, illustrates what happens when a user clicks an object that the browser can't show:
User (to the browser): Click, show me this object.
Browser (to the server): Send me the data this link points to.
Server: OK, but first you should know that it is of this MIME data type/subtype. Here it comes.
Browser (to itself): Ohhhh, it's that kind of MIME data type/subtype. Let's see, that means I can't display it myself, so I have to send it to a helper application that understands that data type. Let me look at my list. Which one handles this MIME data type/subtype? (Note that browsers are getting more and more sophisticated at handling multiple file types internally without having to pass the data to an external helper application.)
Browser (to the selected helper application): Here, deal with this data.
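The browser's side of this conversation can be sketched as a small decision function. The built-in type set and the helper table below are hypothetical (mpegplay.exe and pluperstat.exe are placeholder program names); a real browser keeps these lists in its own configuration.

```python
# Types this imaginary browser can display itself.
BUILT_IN_TYPES = {"text/html", "text/plain", "image/gif", "image/jpeg"}

# Hypothetical helper application table: MIME type/subtype -> program.
HELPERS = {
    "video/mpeg": "mpegplay.exe",
    "application/x-pluperstat": "pluperstat.exe",
}


def decide(content_type_header):
    """Return ('display' | 'helper' | 'ask', program) for a Content-Type header value."""
    # Drop parameters such as '; charset=iso-8859-1' and normalize case.
    mime_type = content_type_header.split(";", 1)[0].strip().lower()
    if mime_type in BUILT_IN_TYPES:
        return ("display", None)
    if mime_type in HELPERS:
        return ("helper", HELPERS[mime_type])
    return ("ask", None)   # no helper configured: ask the user to save or pick one


for header in ("text/html; charset=iso-8859-1", "video/mpeg", "application/x-unknown"):
    print(header, "->", decide(header))
```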
This section outlines the process of
setting up a Web server and browsers to use helper applications.
To focus on the general principles used, pretend you have a helper
application called PluPerStat. You don't need to know what this
program does or anything about the data it produces/uses, but
assume a couple of things about it:
- PluPerStat has some kind of proprietary data format.
- PluPerStat stores its files with the filename extension .plu.
Your first step in setting up PluPerStat
as a helper application for your Intranet is to edit the MIME
map in the Registry on your Web server to add an entry for it.
Not all Web servers store the MIME map in the NT Registry. See
your server documentation to make sure of the name and location
of the MIME map file if you are using some Web server other than
IIS.
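The new entry follows the same row layout as the Word entry. This Python sketch composes and sanity-checks such a row; the type name application/x-pluperstat matches the one used later in this chapter, while the Gopher code 5 (copied from the Word entry) is an assumption. Actually adding the value to the Registry is done with the Registry Editor and is not shown.

```python
def make_mimemap_row(mime_type, extension, gopher_type):
    """Compose '<mime type>,<filename extension>,,<gopher type>' after basic checks."""
    major, slash, minor = mime_type.partition("/")
    if not (slash and major and minor):
        raise ValueError("MIME type must look like type/subtype: " + mime_type)
    extension = extension.lstrip(".").lower()
    if not extension or "," in extension:
        raise ValueError("bad filename extension: " + extension)
    return ",".join([mime_type.lower(), extension, "", gopher_type])


def is_private_subtype(mime_type):
    """Unregistered, locally invented subtypes conventionally start with 'x-'."""
    return mime_type.partition("/")[2].lower().startswith("x-")


row = make_mimemap_row("application/x-pluperstat", ".plu", "5")
print(row)                                             # application/x-pluperstat,plu,,5
print(is_private_subtype("application/x-pluperstat"))  # True
```

The x- prefix signals that PluPerStat's type is a local invention rather than a registered MIME type, the same convention the x-world entries in the IIS map follow.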
Set Up the New Helper Application on Your Browser(s)
Before you can use PluPerStat as a helper
application, you need to tell your Web browser about it and its
MIME data type/subtype. Different browsers have different mechanisms
for adding helper applications. The following section covers Explorer.
If you're using another browser, check your documentation (if
necessary). You will probably realize as soon as you read the
steps for Explorer that the concepts can be easily applied to
any browser.
Setting up Internet Explorer for MIME
To set up Internet Explorer 2.0 to use the
imaginary PluPerStat data format, perform the following steps.
Note that Explorer uses the term file types rather
than helper application to accomplish the same purpose.
- Run Explorer and choose View | Options | File Types from the
main menu. You will see the dialog shown in Figure 12.2.
- Click the New Type button to open the dialog shown in Figure
12.3.
Figure 12.2 : The Internet Explorer File Types dialog.
- Fill in the boxes with the appropriate information. The one
thing you can't really do in this example is fill in the path
to the application that will be used to open this type of file
when this type of data is downloaded. But assuming you had a real
application in mind, you could click the New button and fill in
the path.
Figure 12.3 : This Internet Explorer dialog is used to
add MIME types.
- When you've finished, click OK, and then click OK again to
save the new MIME information.
Explorer is now configured to use PluPerStat as a helper application
whenever it encounters the MIME type/subtype application/x-pluperstat
or the filename extension .plu.
Careful readers will notice the first
sentence in the preceding paragraph says "whenever it encounters
the MIME type/subtype application/x-pluperstat or
the filename extension .plu." You may wonder why
it's necessary to include both the MIME type/subtype and filename
extension. After all, you've learned the Web server includes this
information in the MIME data type/subtype headers, so why does
Explorer (or any other Web browser) have to be configured to specify
both pieces of information?
Although this information is indeed redundant when communicating
with a Web server, Web browsers also communicate with other kinds
of Internet information servers, such as FTP and Gopher servers.
These Internet services pre-date both the World Wide Web and MIME;
they don't know anything about MIME types. Because they were designed
simply to transfer files, they have no reason to precede the data they
send with any identifying header information at all. Since an
FTP server, for example, has no way of telling a Web browser what
sort of data is coming, Web browsers use a workaround, mapping
the filename extension to a MIME data type/subtype. You've
no doubt seen your browser display a set of canned icons, representing
different file types, when you connect to an FTP or Gopher server.
These icons are your browser's MIME mechanism at work, using the
filename extensions it finds and a built-in list of MIME data
types/subtypes.
Thus, if you're connected to an FTP server with your Mosaic browser
and you click a link pointing to a file with the .plu
extension, Mosaic can make the assumption the file is a PluPerStat
data file because you've configured a helper application for this
kind of data. There's no guarantee, though, that the .plu
file is really a PluPerStat data file. After all, people are free
to name files anything they want.
The MIME map contains a semi-official list of MIME types and filename
extensions, and Web browsers are built to rely on that list. Although
you added a new type, application/x-pluperstat in the
example, to your browser, there's no guarantee that other Web
or Internet servers won't have used the same filename extension
for some other kind of data file. Still, the key point is that
Web browsers have a built-in list of filename extension/MIME type
associations to fall back on in the absence of any MIME header
information coming from the server.
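The fallback rule can be sketched with Python's standard mimetypes module, which keeps a comparable built-in table of extension-to-type associations. MimeTypes.add_type plays the role of the browser configuration step described earlier; the preference for a server-supplied Content-Type and the application/octet-stream default are this sketch's assumptions about typical browser behavior, and the URLs are invented.

```python
import mimetypes

guesser = mimetypes.MimeTypes()                         # starts with a built-in table
guesser.add_type("application/x-pluperstat", ".plu")    # like adding a new file type


def effective_type(url, content_type_header=None):
    """Prefer the server's Content-Type; otherwise guess from the filename extension."""
    if content_type_header:                             # Web servers send this header
        return content_type_header.split(";", 1)[0].strip().lower()
    guessed, _encoding = guesser.guess_type(url)        # FTP and Gopher send none
    return guessed or "application/octet-stream"


print(effective_type("ftp://ftp.example.com/pub/q3.plu"))             # application/x-pluperstat
print(effective_type("http://www.example.com/q3.plu", "text/plain"))  # text/plain
```

As the text warns, the guess is only as good as the filename: nothing stops someone from naming an unrelated file q3.plu.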
The Common Gateway Interface is a standard
way of passing information from the Web fill-in forms you've seen
to back-end CGI scripts or other CGI programs that deal with the
data. CGI is described in detail in Chapter 19, "Getting
the Most out of HTML with CGI," and Chapter 20, "Building
a CGI Database System."
CGI is MIME-aware, which accounts for much of its power. CGI scripts
on your Intranet can return data from your Web server in response
to browser requests, in much the same way as you get data when
you click a hyperlink. Although most people think of the data
being returned from a Web server as being from static files on
the server (such as Web pages written in HTML, images, and so
on), CGI scripts and programs can generate data on the fly in
response to user requests. Such requests can be, for example,
based on a fill-in form. The user enters information in the form,
and then clicks a Submit button. The CGI script then processes
the information entered, generates a new stream of data based
on the user input, and returns it to the client. Thus, a fill-in
form can solicit input from a user such as search criteria in
a database application, construct a query using the user data,
run the query against the database, and return the results to
the user's Web browser as an HTML document.
The mechanics of this CGI on-the-fly data generation use the MIME
mechanism. Just as your Intranet server, sending back data in
response to a mouse click on a hyperlink, precedes that data with
header information containing the MIME data type/subtype of the
data to be sent, your CGI scripts must return the same sort of
information about the data stream they're about to send. Thus,
any Perl CGI script's very first output statements might be something
like the following:
print "Content-Type: text/html\n";
print "\n";
These statements print the string Content-Type: text/html
followed by a newline, and then print a blank line. You've seen
Content-Type before, just a few pages
back, as well as the necessary blank line. Recall the discussion
of the fundamental RFC 822 e-mail requirement: Messages must be
separated into a header area and a body area with a blank line
between them. What you have here is exactly the same: The CGI
script generates a MIME data type/subtype header (in this case
Content-Type: text/html) followed by a blank line as
the very first bit of data to be returned to the Web browser.
In this example, as required in all CGI scripts that are to return
data to the user's Web browser, the program informs the browser
that the forthcoming data is of the MIME type text and
subtype html. The rest of the data generated by the script
is, in fact, text data with HTML codes. Such data can include
any and all HTML markup, including URLs pointing to other Web
documents, images, or even other CGI scripts. Use of variable
substitution in CGI scripts, for example, can enable you to generate
documents, forms, or anything else that can be flagged in HTML,
all with the simple use of one MIME type/subtype header preceding
the data.
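For comparison, here is a minimal CGI script in Python (rather than Perl) that uses variable substitution: it reads the form data the server passes in the QUERY_STRING environment variable and echoes a search term back inside an HTML page. The form field name q is hypothetical, and a real script would run the query against a database instead of simply echoing it.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def render(query_string):
    """Build a complete CGI response for a GET form submission."""
    fields = parse_qs(query_string)
    term = fields.get("q", [""])[0]           # 'q' is a hypothetical field name
    body = (
        "<html><head><title>Search results</title></head><body>\n"
        "<p>You searched for: " + html.escape(term) + "</p>\n"
        "</body></html>\n"
    )
    # MIME header first, then the blank line, then the HTML body.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(render(os.environ.get("QUERY_STRING", "")))
```

Escaping the user's input with html.escape keeps characters such as & and < from being read as HTML markup.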
This simple, yet powerful, example uses the text/html
MIME type/subtype, but there is no reason your CGI scripts can't
return any other valid MIME type/subtype. Provided you've set
up your Web server's MIME map and your users' Web browsers have
corresponding helper application setup, there's almost no limit
to what you can return from your CGI scripts. For example, the
preceding Perl print statements could just as well be the following:
print "Content-Type: application/x-pluperstat\n";
print "\n";
Your script would then select and return a PluPerStat data file,
based on information the user enters into a fill-in form on your
Intranet. This way, you can make a library of PluPerStat data
available on your Web server, enable your customers to grab pieces
of it using their Web browsers, and then interact with the data
using the PluPerStat program itself. You've just made your Intranet
something more than just a look-at-pictures-and-read-text-files
server: Your customers can actually use it for their real work.
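The PluPerStat variant can be sketched the same way in Python. The library directory, the form field name file, and the .plu check are hypothetical. The script reduces the requested name to a bare filename so a user cannot reach files outside the library, writes bytes because PluPerStat data is binary, and only runs its main section when a Web server has set the standard GATEWAY_INTERFACE variable.

```python
import os
import sys
from pathlib import Path, PureWindowsPath
from urllib.parse import parse_qs

LIBRARY = Path(r"C:\InetPub\pluperstat")     # hypothetical data directory on the server
CONTENT_TYPE = "application/x-pluperstat"


def safe_plu_name(requested):
    """Strip directory parts from user input and require a .plu extension."""
    name = PureWindowsPath(requested).name   # handles both \ and / separators
    if not name.lower().endswith(".plu"):
        raise ValueError("not a PluPerStat file: " + requested)
    return name


def build_response(data):
    """Prefix raw file bytes with the MIME header and the separating blank line."""
    return ("Content-Type: " + CONTENT_TYPE + "\n\n").encode("ascii") + data


if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:   # only under a Web server
    fields = parse_qs(os.environ.get("QUERY_STRING", ""))
    name = safe_plu_name(fields.get("file", [""])[0])
    sys.stdout.buffer.write(build_response((LIBRARY / name).read_bytes()))
```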
This chapter is the heart of this book. In
it, you've learned the following:
- What Web helper applications are
- What MIME is and where it came from
- How the developers of the World Wide Web adopted MIME as a
major part of Web technology
- How Web servers and browsers use MIME to identify and process
data
- The relationship between MIME and Web browser helper applications
- The basics of helper application setup in Explorer
- How MIME and the CGI mechanism work together
The next chapter continues the discussion of MIME by showing you
how to hook your office word processor into your Intranet. Later
chapters talk about your own application programs and apply the
information you've learned in this chapter to real programs that
do real work.

Copyright 1998 Macmillan Computer Publishing. All rights reserved.